Pubmed citation network


In [1]:
import networkx as nx
import numpy as np
import pickle as p
from scipy.sparse import csr_matrix
from matplotlib import pyplot as plt
%matplotlib inline

data_loc = './../data/raw/pubmed/'

The Pubmed dataset consists of 19717 scientific publications from the PubMed database pertaining to diabetes, each classified into one of three classes:

  • Diabetes Mellitus, Experimental
  • Diabetes Mellitus Type 1
  • Diabetes Mellitus Type 2

The citation network consists of 44338 links. Each publication in the dataset is described by a TF/IDF-weighted word vector over a dictionary of 500 unique words.
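
To make the shape of this data concrete, here is a minimal sketch on made-up toy data (4 papers, 3 links, a 5-word dictionary) rather than the real files under `data_loc`. The names `toy_links`, `toy_graph`, `toy_features`, `toy_labels` and `class_names` are hypothetical, the numeric values are invented, and treating links as directed (citing paper to cited paper) is an assumption, since the description above does not state a direction.

```python
import networkx as nx
import numpy as np
from scipy.sparse import csr_matrix

# Toy stand-in for the real data (assumption: values are invented).
# The real dataset has 19717 papers, 44338 links and a 500-word dictionary.
toy_links = [(0, 1), (1, 2), (3, 1)]  # (citing paper, cited paper)

# Assumption: citations are modelled as directed edges.
toy_graph = nx.DiGraph()
toy_graph.add_nodes_from(range(4))
toy_graph.add_edges_from(toy_links)

# Sparse TF/IDF-style features: one row per paper, one column per word.
rows = np.array([0, 0, 1, 2, 3])
cols = np.array([0, 3, 1, 4, 2])
vals = np.array([0.12, 0.05, 0.30, 0.08, 0.21])
toy_features = csr_matrix((vals, (rows, cols)), shape=(4, 5))

# Integer class labels indexing into the three classes listed above.
class_names = [
    'Diabetes Mellitus, Experimental',
    'Diabetes Mellitus Type 1',
    'Diabetes Mellitus Type 2',
]
toy_labels = np.array([0, 2, 1, 2])

print(toy_graph.number_of_nodes(), toy_graph.number_of_edges())
print(toy_features.shape, toy_features.nnz)
```

A sparse matrix is a natural fit here because each paper typically uses only a small fraction of the dictionary, so most entries of the feature matrix are zero.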


In [ ]: